imagenet data
Benchmark data to study the influence of pre-training on explanation performance in MR image classification
Oliveira, Marta, Wilming, Rick, Clark, Benedict, Budding, Céline, Eitel, Fabian, Ritter, Kerstin, Haufe, Stefan
Convolutional Neural Networks (CNNs) are frequently and successfully used in medical prediction tasks. They are often used in combination with transfer learning, leading to improved performance when training data for the task are scarce. The resulting models are highly complex and typically do not provide any insight into their predictive mechanisms, motivating the field of 'explainable' artificial intelligence (XAI). However, previous studies have rarely quantitatively evaluated the 'explanation performance' of XAI methods against ground-truth data, and transfer learning and its influence on objective measures of explanation performance has not been investigated. Here, we propose a benchmark dataset that allows for quantifying explanation performance in a realistic magnetic resonance imaging (MRI) classification task. We employ this benchmark to understand the influence of transfer learning on the quality of explanations. Experimental results show that popular XAI methods applied to the same underlying model differ vastly in performance, even when considering only correctly classified examples. We further observe that explanation performance strongly depends on the task used for pre-training and the number of CNN layers pre-trained. These results hold after correcting for a substantial correlation between explanation and classification performance.
MIT researchers find 'systematic' shortcomings in ImageNet data set
MIT researchers have concluded that the well-known ImageNet data set has "systematic annotation issues" and is misaligned with ground truth or direct observation when used as a benchmark data set. "Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for," the researchers write in a paper titled "From ImageNet to Image Classification: Contextualizing Progress on Benchmarks." "We believe that developing annotation pipelines that better capture the ground truth while remaining scalable is an important avenue for future research." When the Stanford University Vision Lab introduced ImageNet at the Conference on Computer Vision and Pattern Recognition (CVPR) in 2009, it was much larger than many previously existing image data sets. The ImageNet data set contains millions of photos and was assembled over the span of more than two years. ImageNet uses the WordNet hierarchy for data labels and is widely used as a benchmark for object recognition models.
Inside the 1TB ImageNet data set used to train the world's AI: Nude kids, drunken frat parties, porno stars, and more
Special report: ImageNet – a data set used to train AI systems around the world – contains photos of naked children, families on the beach, college parties, porn actresses, and more, scraped from the web to train computers without those individuals' explicit consent. The library consists of 14 million images, each placed into categories that describe what's pictured in each scene. This pairing of information – images and labels – is used to teach artificially intelligent applications to recognize things and people caught on camera. The database has been downloaded by boffins, engineers, and academics to train hundreds if not thousands of neural networks to identify stuff in photos – from assault rifles and aprons to magpies and minibuses to zebras and zucchinis, and everything in between. In 2012, the data set was used to build AlexNet, heralded as a breakthrough development in deep learning since it marked the first time a neural network outperformed traditional computational methods at object recognition in terms of accuracy.
Progressive Weight Pruning of Deep Neural Networks using ADMM
Ye, Shaokai, Zhang, Tianyun, Zhang, Kaiqi, Li, Jiayu, Xu, Kaidi, Yang, Yunfei, Yu, Fuxun, Tang, Jian, Fardad, Makan, Liu, Sijia, Chen, Xiang, Lin, Xue, Wang, Yanzhi
Deep neural networks (DNNs), although achieving human-level performance in many domains, have very large model sizes that hinder their broader application on edge computing devices. Extensive research has been conducted on DNN model compression or pruning. However, most previous work took heuristic approaches. This work proposes a progressive weight pruning approach based on ADMM (Alternating Direction Method of Multipliers), a powerful technique to deal with non-convex optimization problems with potentially combinatorial constraints. Motivated by dynamic programming, the proposed method reaches an extremely high pruning rate by using partial prunings with moderate pruning rates. Therefore, it resolves the accuracy degradation and long convergence time problems that arise when pursuing extremely high pruning ratios. It achieves up to a 34 times pruning rate for the ImageNet dataset and a 167 times pruning rate for the MNIST dataset, significantly higher than those reported in the literature. Under the same number of epochs, the proposed method also achieves faster convergence and higher compression rates. The code and pruned DNN models are released at bit.ly/2zxdlss
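To make the idea of ADMM-based progressive pruning concrete, here is a minimal toy sketch. It is not the authors' implementation: it replaces the network's training loss with a simple quadratic that keeps the weights close to their current values, and the names `project_topk`, `admm_prune`, and `progressive_prune`, as well as the keep fractions, are hypothetical. It shows only the three ADMM steps (weight update, projection onto the sparsity constraint, dual update) and the chaining of moderate pruning stages.

```python
def project_topk(values, k):
    """Keep the k largest-magnitude entries and zero the rest."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def admm_prune(dense, k, rho=1.0, iters=50):
    """Toy ADMM: minimize 0.5*||w - dense||^2 subject to at most k nonzeros.

    Assumption: the quadratic term stands in for a real training loss.
    """
    z = project_topk(dense, k)      # auxiliary sparse copy of the weights
    u = [0.0] * len(dense)          # scaled dual variable
    for _ in range(iters):
        # W-step: closed-form minimizer of the loss plus the ADMM penalty.
        w = [(d + rho * (zi - ui)) / (1.0 + rho) for d, zi, ui in zip(dense, z, u)]
        # Z-step: project w + u onto the set of k-sparse vectors.
        z = project_topk([wi + ui for wi, ui in zip(w, u)], k)
        # Dual update: accumulate the remaining disagreement between w and z.
        u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    return z


def progressive_prune(dense, keep_fractions):
    """Chain several moderate pruning stages; each starts from the previous result."""
    weights = list(dense)
    for fraction in keep_fractions:
        k = max(1, round(fraction * len(dense)))
        weights = admm_prune(weights, k)
    return weights


weights = [0.9, -0.1, 0.05, -1.2, 0.3, 0.02, 0.7, -0.4]
pruned = progressive_prune(weights, keep_fractions=[0.5, 0.25])
nonzero = sum(1 for v in pruned if v != 0.0)
print(pruned, "pruning rate:", len(weights) / nonzero)
```

In this sketch a pruning rate of N times means that one in N weights remains nonzero, which is how the 34 times and 167 times figures in the abstract are read here.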
AI can be sexist and racist -- it's time to make it fair
When Google Translate converts news articles written in Spanish into English, phrases referring to women often become 'he said' or 'he wrote'. Software designed to warn people using Nikon cameras when the person they are photographing seems to be blinking tends to interpret Asians as always blinking. Word embedding, a popular algorithm used to process and analyse large amounts of natural-language data, characterizes European American names as pleasant and African American ones as unpleasant. These are just a few of the many examples uncovered so far of artificial intelligence (AI) applications systematically discriminating against specific populations. Biased decision-making is hardly unique to AI, but as many researchers have noted, the growing scope of AI makes it particularly important to address.
Keras Tutorial : Using pre-trained ImageNet models
Next, we will learn how to use models pre-trained on large datasets like ILSVRC, and also how to use them for a task different from the one they were trained on. ImageNet is a project which aims to provide a large image database for research purposes. It contains more than 14 million images which belong to more than 20,000 classes (or synsets). The project also provides bounding box annotations for around 1 million images, which can be used in object localization tasks. Note that it provides only the URLs of the images, so you need to download the images yourself.
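As a rough illustration of reusing an ImageNet-pre-trained model for a different task, the sketch below freezes a VGG16 convolutional base and adds a new classification head. This is an assumed setup, not the tutorial's own code: the function name `build_transfer_model`, the choice of VGG16, the pooling layer, and the optimizer are all illustrative, and TensorFlow must be installed for the function to run.

```python
def build_transfer_model(num_classes, weights="imagenet", input_shape=(224, 224, 3)):
    """Return a classifier that reuses a pre-trained convolutional base.

    Hypothetical helper: passing weights="imagenet" downloads the pre-trained
    weights on first use; pass weights=None for random initialization.
    """
    # Imported lazily so this module can be loaded without TensorFlow installed.
    from tensorflow import keras

    # Load the convolutional base without its original 1000-class head.
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape
    )
    base.trainable = False  # freeze the pre-trained feature extractor

    # Attach a small new head for the target task.
    model = keras.Sequential([
        base,
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_transfer_model(num_classes=5)` would give a model whose only trainable parameters are in the final dense layer, which is one common way to adapt a pre-trained network when the new dataset is small.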
No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World
Shankar, Shreya, Halpern, Yoni, Breck, Eric, Atwood, James, Wilson, Jimbo, Sculley, D.
Modern machine learning systems such as image classifiers rely heavily on large scale data sets for training. Such data sets are costly to create, thus in practice a small number of freely available, open source data sets are widely used. We suggest that examining the geo-diversity of open data sets is critical before adopting a data set for use cases in the developing world. We analyze two large, publicly available image data sets to assess geo-diversity and find that these data sets appear to exhibit an observable amerocentric and eurocentric representation bias. Further, we analyze classifiers trained on these data sets to assess the impact of these training distributions and find strong differences in the relative performance on images from different locales. These results emphasize the need to ensure geo-representation when constructing data sets for use in the developing world.